VRVis – ComVis – Phone

VAST 2008 Challenge
Mini Challenge 3:  Cell Phone Calls 

Authors and Affiliations:

 

      Zoltan Konyha, VRVis, konyha@vrvis.at [PRIMARY contact]
      Kresimir Matkovic, VRVis, matkovic@vrvis.at
      Wolfgang Freiler, VRVis, freiler@vrvis.at
      Denis Gracanin, Virginia Tech, gracanin@vt.edu
      Ranko Miklin, University of Zagreb, r.miklin@gmail.com
      Tomislav Lipic, University of Zagreb, tomislav.lipic@fer.hr
      Mario Beric, University of Zagreb, Mario.Beric@fer.hr

 

Student team: NO

 

Tool(s):

 

We used ComVis (http://www.comvis.at) in our analysis. ComVis is an interactive visualization application built on multiple linked views. It offers numerous types of linked views for scalar, categorical and time series data. It also supports composite brushing: brushes defined in the same view or in different views can be combined using boolean operators. The brushed set of items can also be displayed in tabular format. We used a simple Python script to compute various aggregates from the data set, including the set of contacts of each person, and the Python package NetworkX (https://networkx.lanl.gov/) to get an overview of the properties of the network graph.

 

 

Two Page Summary:   YES

 

        VRVis-ComVis-Phone-Summary.pdf

 

 

ANSWERS:


Phone-1: What is the Catalano/Vidro social network, as reflected in the cell phone call data, at the end of the time period?

   PhoneNodes.txt

   PhoneLinks.txt

 


Phone-2: Characterize the changes in the Catalano/Vidro social structure over the ten day period.

Detailed Answer:

In short, the five persons mentioned in the challenge started using new phone numbers after day seven and moved to different locations. This video captures a part of our analysis.

 

Analysis with multiple linked views

 

The set of linked views shown in Figure 1 provides an overview of the caller, the callee, the date and time of the call and the location of the caller. Each point in the two 20x20 matrices represents one of the 400 IDs, as a caller (top left) or as a callee (top right). In the bottom left, a scatter plot of the latitude/longitude coordinates of the towers serves as a map. In the bottom right, a scatter plot of days (horizontal axis) and time of day (vertical axis) serves as a calendar.

 

Figure 1: Contacts of ID200.

Figure 2: Phone calls between IDs 200 and 5.

 

The description of the mini challenge says that we have medium confidence that Ferdinando Catalano is identifier 200 and he would call Estaban Catalano most frequently. We brushed ID200 in the caller matrix in Figure 1. The linked callee matrix and the detail table indicate that he called IDs 1, 2, 3, 5, 97 and 137.

 

We then brushed each of the highlighted callees in turn. The logical AND of the two brushes selects the calls from ID200 to the individual callee. The highlighted points in the calendar show when those calls were made. A snapshot of this process is shown in Figure 2. ID200 called ID5 most often (once a day during the first days). There are also seven calls from ID5 to ID200 with a similar temporal pattern. We concluded that ID5 is Estaban Catalano.
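
The composite brush described above can be sketched outside ComVis as a plain boolean filter. This is only an illustration: the record layout (caller, callee, day, hour), the function name brush_and and the toy values are assumptions, not the challenge data or the ComVis implementation.

```python
from collections import Counter

# Hypothetical call records: (caller, callee, day, hour).
# Toy values for illustration only, not taken from the challenge data.
calls = [
    (200, 5, 1, 9), (200, 5, 2, 9), (200, 1, 2, 14),
    (5, 200, 3, 18), (200, 5, 3, 10), (17, 5, 3, 11),
]

def brush_and(records, caller, callee):
    """Keep records that satisfy the caller brush AND the callee brush."""
    return [r for r in records if r[0] == caller and r[1] == callee]

# Calls from ID200 to ID5, then their distribution over days
# (the information the calendar view highlights).
selected = brush_and(calls, 200, 5)
calls_per_day = Counter(day for _, _, day, _ in selected)
```

Repeating the filter with the roles swapped (caller 5, callee 200) gives the calls in the opposite direction.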

 

ID200 did not make any calls to IDs 1, 2, 3 and 5 on the last three days. We checked whether anyone else did. The logical OR combination of the brushes in the callee matrix in Figure 3 selects the records where those numbers were called. All but one of those calls were made in the first seven days. We also noticed that these four IDs made no calls on days 8 and 9.

 

Figure 3: Calls to IDs 1, 2, 3 and 5.

Figure 4: Number of people calling contacts of ID200.

 

What we learned from aggregates

 

Using a Python script, we computed aggregates for each of the 400 persons: the number of incoming and outgoing calls, the number of IDs called by the person, the number of IDs calling the person, and a time series that gives the number of phone calls in each hour. The time series gave us valuable insight into the temporal patterns, which are discussed in the summary.
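
A minimal sketch of how such per-person aggregates could be computed is given below. It is not the script we used. The record layout, the field names and the choice to count both incoming and outgoing calls in the hourly series are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, day, hour). Toy values only.
calls = [(200, 5, 1, 9), (200, 1, 1, 9), (5, 200, 1, 18), (200, 5, 2, 9)]

def new_entry():
    """Empty aggregate record for one person (field names are assumptions)."""
    return {
        "out_calls": 0,              # number of outgoing calls
        "in_calls": 0,               # number of incoming calls
        "called": set(),             # IDs called by the person
        "called_by": set(),          # IDs calling the person
        "hourly": defaultdict(int),  # (day, hour) -> number of calls
    }

def aggregate(records):
    """Compute per-person aggregates in a single pass over the records."""
    stats = defaultdict(new_entry)
    for caller, callee, day, hour in records:
        stats[caller]["out_calls"] += 1
        stats[caller]["called"].add(callee)
        stats[caller]["hourly"][(day, hour)] += 1
        stats[callee]["in_calls"] += 1
        stats[callee]["called_by"].add(caller)
        stats[callee]["hourly"][(day, hour)] += 1
    return stats

stats = aggregate(calls)
```

The sizes of the "called" and "called_by" sets correspond to the two counts plotted in the scatter plots of Figure 4.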

 

In Figure 4 we added a scatter plot showing the number of people calling a given person (top right) and one showing the number of people called by a given person (bottom right). We brushed the six contacts of ID200. The top right scatter plot and the detail table show that IDs 1 and 5 have received calls from many people. This is typical for someone coordinating a network. We know that ID5 is Estaban Catalano. Therefore we assume that ID1 is David Vidro. IDs 2 and 3 also received calls from many people, while 97 and 137 have fewer contacts. IDs 2 and 3 are Juan Vidro and Jorge Vidro, but we cannot decide which is which.

 

Figure 5: ID0 talks to many people.

Figure 6: IDs 306, 309, 360 and 397 have many contacts, too.

 

Figure 5 shows that ID0 called the most people and received calls from many different people. ID0 is an important node in the network, but we do not know the associated name. In Figure 6 we have brushed four more persons who received calls from many different people. The detail view displays their IDs: 306, 309, 360 and 397. They made calls in the last three days only.

 

Figure 7: People who had many contacts in the first seven days.

Figure 8: People who had many contacts in the last three days.

 

We suspect that something changed after day seven. We created separate aggregates for the first seven and the last three days. In Figure 7 we brushed IDs that had received calls from many contacts only in the first seven days, but not in the last three. They are IDs 1, 2, 3 and 5. Conversely, Figure 8 shows that IDs 300, 306, 309, 360 and 397 were called by many people in the last three days but not in the first seven.
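
The split into two periods can be sketched as follows. The record layout, the toy values and the threshold of two distinct callers are assumptions chosen for the example. In our analysis the selection was made interactively by brushing, not with a fixed threshold.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, day). Toy values only.
calls = [
    (10, 1, 2), (11, 1, 3), (12, 1, 6),
    (10, 309, 8), (11, 309, 9), (12, 309, 10),
    (10, 0, 2), (11, 0, 9),
]

def distinct_callers(records, first_day, last_day):
    """Number of distinct callers per callee within [first_day, last_day]."""
    callers = defaultdict(set)
    for caller, callee, day in records:
        if first_day <= day <= last_day:
            callers[callee].add(caller)
    return {pid: len(ids) for pid, ids in callers.items()}

early = distinct_callers(calls, 1, 7)   # first seven days
late = distinct_callers(calls, 8, 10)   # last three days

MIN_CALLERS = 2  # hypothetical threshold for "many contacts"
only_early = sorted(p for p in early
                    if early[p] >= MIN_CALLERS and late.get(p, 0) == 0)
only_late = sorted(p for p in late
                   if late[p] >= MIN_CALLERS and early.get(p, 0) == 0)
```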

 

Figure 9: Histogram of the number of common contacts.

We computed the common contacts for each pair of the IDs 0, 1, 2, 3, 5, 13, 200, 300, 306, 309, 360 and 397. In Figure 9, we can see that the pairs 1 and 309, 2 and 397, 3 and 360, and 5 and 306 have many common contacts. One ID in each pair was active in the first seven days while the other was active in the last three. We suspect that after day seven the persons using IDs 1, 2, 3 and 5 started using the numbers 309, 397, 360 and 306, respectively.
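
Counting common contacts amounts to intersecting the contact sets of two IDs. The sketch below assumes the contact sets have already been computed. The sets shown are made-up toy values and the function name is hypothetical.

```python
from itertools import combinations

# Hypothetical contact sets per ID (toy values, not the challenge data).
contacts = {
    1: {0, 200, 41, 42, 43},
    309: {0, 300, 41, 42, 43},
    5: {0, 200, 61, 62},
    306: {0, 300, 61, 62},
}

def common_contact_counts(contact_sets):
    """Size of the intersection of contact sets for every unordered pair."""
    return {
        (a, b): len(contact_sets[a] & contact_sets[b])
        for a, b in combinations(sorted(contact_sets), 2)
    }

counts = common_contact_counts(contacts)
# Pairs with the most common contacts first.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

A histogram of the values in counts corresponds to the view shown in Figure 9.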

 

 

The only common contacts of IDs 1, 2, 3 and 5 are 0 and 200. ID0 has mostly the same partners before and after day seven, thus we assume it belongs to the same person. IDs 306, 309, 360 and 397 are the same four people as IDs 1, 2, 3 and 5. Their only common contact is ID300 who also becomes active on the last three days only. Therefore, we assume that ID200 became ID300 after day seven. IDs 1, 2, 3, 5 and 200 talk to people in the last three days they have not (often) talked to before, therefore we assume that different people started using those phones. The following table summarizes the changes in the network:

 

Name                         ID on days 1-7    ID on days 8-10
Ferdinando Catalano          200               300
Estaban Catalano             5                 306
David Vidro                  1                 309
Jorge Vidro or Juan Vidro    2                 397
Jorge Vidro or Juan Vidro    3                 360

 

We studied the locations of the towers from which those ten numbers were calling to get an idea of the geographical extent of the movement. In general, we found that in the first seven days they stayed mostly near towers 11 and 29 in the city in the middle of the island and near tower 30 in the north of the island. After day seven, some of them moved to the south of the island. The following table provides details of the locations of the callers.

 

ID     Calling from tower
0      From tower 7 and in the evenings from 21
1      11 and 29
2      Mostly from 29, one call from 11
3      Mostly from 30, few from 10
5      Mostly from 30, few from 29
200    Mostly from 29, few from 28, 13 in the evenings
300    From 29 until about 6 PM on day 8, then from 17
306    From 30 on day 8, from 29 on day 9, from 12 on day 10
309    Quickly traveling between towers 7, 11, 29, 21, 22. Stays near 22 on day 10.
360    Mostly near tower 30 on day 8, near 28 on day 9 and tower 9 on day 10
397    Traveling from tower 20 to 3 through 29 on days 8 and 9. On day 10 there is only one call, from 20.
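
A per-day tower summary like the one in the table can be sketched as a count of calls per tower for each caller and day. The record layout and the toy values are assumptions. In our analysis the locations were read from the linked map and calendar views.

```python
from collections import Counter, defaultdict

# Hypothetical call records: (caller, day, hour, tower). Toy values only.
calls = [
    (300, 8, 10, 29), (300, 8, 15, 29), (300, 8, 19, 17), (300, 9, 11, 17),
]

def towers_by_day(records):
    """Count calls per tower for each (caller, day) combination."""
    usage = defaultdict(Counter)
    for caller, day, _hour, tower in records:
        usage[(caller, day)][tower] += 1
    return usage

usage = towers_by_day(calls)
# Most frequently used tower of ID300 on day 8, with its call count.
top_tower_day8 = usage[(300, 8)].most_common(1)[0]
```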

 

A part of the procedure of gathering this information is captured in this video.

 

We have a weak suspicion that tower 30 is not where the map indicates, but somewhere near 28 and 29. However, we were unable to find strong enough evidence that would have allowed us to make such an important modification in the data.